An 80/20 Data Quality Law for Professional Scientometrics?

Authors

  • Andreas Strotmann
  • Dangzhi Zhao
Abstract

Scientometric network error consequences

Only very recently have researchers begun looking at what concrete effect the errors in a network model caused by name ambiguities in the data sources may have on the results of popular types of network analysis. The results that they report are quite alarming in the aggregate: not only do typical evaluative analyses of individuals (e.g., citation rankings) suffer significantly from these errors, but there is mounting evidence that even the most basic statistical features of realistic large-scale networks are hugely distorted by ambiguities. Strotmann et al. (2009), for example, document significant distortions in co-authorship network visualizations, and Diesner and Carley (2013) report that “minor changes in accuracy rates of [name disambiguation] lead to comparatively huge changes in network metrics, while the set [of] top-scoring key entities is highly robust. Co-occurrence based link formation entails a small chance of false negatives, but the rate of false positives is alarmingly high.” In fact, Fegley and Torvik (2013) go so far as to dismiss one of the most famous recent results in large-scale social network analysis, the exact power-law distribution from preferential attachment (Barabási & Albert, 1999), at least in the case of scientific collaboration networks (Barabási et al., 2002), as a mere artefact produced by a lack of name disambiguation in the underlying dataset! The ultimate irony here is that Fegley and Torvik’s (2013) data are consistent with an interpretation that Barabási's cooperation network power law may have been induced by a power-law distribution of name ambiguities rather than of co-authorships. Similarly, Strotmann and Zhao (2013) find that even highly stable statistical analysis methods of author co-citation analysis fail in the face of large-scale ambiguity errors in the underlying dataset.
While for evaluative bibliometrics the most serious problem is generally the “splitting” of individuals, i.e., the failure to recognize each and every one of an individual’s contributions correctly (especially those of high-performing individuals), Fegley and Torvik (2013) find that splitting is not the main concern in relational network analysis. Instead, both they and Strotmann and Zhao (2013) find that it is the erroneous “merging” of individuals, i.e., the failure to separate the contributions of multiple individuals correctly because their names are too similar, that causes major distortions in the results of large-scale relational network analysis. East Asian names are especially prone to extreme amounts of merging. While in European cultures there are relatively few common given names but a large variety of family names, in Chinese, Korean and other East Asian cultures the opposite is the case: a small number of surnames is shared by half their populations, but given names are much more varied. The old tradition in scientific publishing of listing authors by surname and initials works, more or less, when science is done in European-origin cultures, but all bibliographic databases have in recent years had to move to a full-name model as research boomed in the Asian Tiger nations (e.g., PubMed/MEDLINE in 2002).
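The merging effect described above can be illustrated with a small sketch. It is not the authors' method or any database's actual matching rule: the author records, the identifiers, and the helper names `initials_key`, `full_name_key`, `group_by_key` and `merged_groups` are all hypothetical, and real disambiguation uses far more evidence than the name string. The sketch only shows how a surname-plus-initials key collapses distinct people who share a common surname, while a full-name key keeps them apart.

```python
from collections import defaultdict

# Hypothetical author records: (author_id, surname, given_name).
# Each id stands for a distinct real person; names are invented.
AUTHORS = [
    (1, "Zhang", "Wei"),
    (2, "Zhang", "Wen"),
    (3, "Zhang", "Wanli"),
    (4, "Keller", "Anna"),
    (5, "Brandt", "Anna"),
]

def initials_key(surname, given):
    """Traditional key: surname plus the first letter of each given-name part."""
    parts = given.replace("-", " ").split()
    initials = "".join(part[0].upper() for part in parts)
    return f"{surname} {initials}"

def full_name_key(surname, given):
    """Full-name key: surname plus the complete given name."""
    return f"{surname} {given}"

def group_by_key(authors, key_fn):
    """Map each name key to the set of distinct author ids that share it."""
    groups = defaultdict(set)
    for author_id, surname, given in authors:
        groups[key_fn(surname, given)].add(author_id)
    return groups

def merged_groups(groups):
    """Keep only keys under which more than one person is merged."""
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}

by_initials = merged_groups(group_by_key(AUTHORS, initials_key))
by_full_name = merged_groups(group_by_key(AUTHORS, full_name_key))

print(by_initials)   # {'Zhang W': [1, 2, 3]}
print(by_full_name)  # {}
```

In this toy data the three authors named Zhang become a single network node under the initials key, whereas the two authors who share the given name Anna stay separate because their surnames differ. The full-name key removes the merge here, although in real data identical full names still occur and need further disambiguation.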


Similar resources

Using Lotkaian Informetrics for Ranking in Digital Libraries

The purpose of this paper is to propose the use of models, theories and laws in bibliometrics and scientometrics to enhance information retrieval processes, especially ranking. A common pattern in many man-made data sets is Lotka’s Law which follows the well-known power-law distributions. These informetric distributions can be used to give an alternative order to large and scattered result sets...


The ADS in the Information Age - Impact on Discovery

The SAO/NASA Astrophysics Data System (ADS) grew up with and has been riding the waves of the Information Age, closely monitoring and anticipating the needs of its end-users. By now, all professional astronomers are using the ADS on a daily basis, and a substantial fraction have been using it for their entire professional career. In addition to being an indispensable tool for professional scien...


An Investigation into Indices of Professional Ethics for Faculty Members Using FAHP

Background: Every organizational unit requires ethical codes, also known as professional ethics, in compliance with its professional structure. Therefore, this study was conducted to identify and rate the indices of professional ethics for faculty members using the fuzzy analytical hierarchy process (FAHP). Method: The statistical population of this descriptive-survey study included faculty mem...


Nursing Students’ Perception of Ethical and Professional Characteristics of an Ideal Faculty Member: A Qualitative Study

Introduction: One of the factors affecting educational quality is the characteristics of faculty members. As consumers, students are stakeholders in the educational products of the faculty. Since receiving feedback from customers is one of the substantial steps for promoting service quality, it is necessary to perform a deep examination of the issue to explicate new dimensions of an ide...


Quality of Professional Life of Nurses in the Hospitals of Torbat-e Heydarieh County in 1395 (Solar Hijri)

Background & Aim: Quality of professional life is a critical concept that is related to the personal characteristics and workplace of individuals. It is also an important issue for the health system and health care givers. There is little information about the quality of professional life among health staff in the country. This study was carried out to assess the quality of professional life among nurse...




Publication date: 2015